DATA ANALYSIS AND VISUALIZATIONS

The structure of the final dataset raises the following questions.

What is the best way to compare testing across countries?

Testing rate is a good way to compare countries by COVID testing, as it is calculated as the number of tests per 100K of population, so countries with different population sizes can be compared directly. The timeline below lets us compare testing rates across countries at different points in time.

The plot shows that during the first few weeks Malta had the highest testing rate of all countries, probably because it has the smallest population. For most of the observed period, however, Denmark led by testing rate. Its rate increased markedly, and the highest testing rate (more than 14,000 tests per 100K of population) was observed in week 51 of 2020.

Is there a correlation between testing rate and positivity rate?

This is an important question, because it helps us decide whether increasing the testing rate matters for identifying more positive COVID cases. To answer it, the dataset was prepared as follows. The columns used for this answer were separated into another data frame. This new data frame was reshaped from long to wide format. The date format was also changed to show weeks only, as all data cover the same year, 2020. That makes the text on the x-axis clearer. The first few rows of the new dataset are shown below.
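The two preparation steps described above (long-to-wide reshape and reducing the date to a week number) can be sketched in plain Python. The column names, the `2020-W10` label format and the sample values are assumptions made for illustration. The helper names `week_number` and `long_to_wide` are hypothetical.

```python
from collections import defaultdict


def week_number(year_week: str) -> int:
    """Extract the week number from a label such as '2020-W51' (assumed format)."""
    return int(year_week.split("-W")[1])


def long_to_wide(rows, index_keys, variable_key, value_key):
    """Pivot long rows (one measurement per row) into wide rows
    (one row per index combination, one column per variable)."""
    wide = defaultdict(dict)
    for row in rows:
        key = tuple(row[k] for k in index_keys)
        wide[key][row[variable_key]] = row[value_key]
    return [dict(zip(index_keys, key), **values) for key, values in wide.items()]


# Hypothetical long-format rows, for illustration only.
long_rows = [
    {"country": "A", "year_week": "2020-W10", "variable": "testing_rate", "value": 120.0},
    {"country": "A", "year_week": "2020-W10", "variable": "positivity_rate", "value": 4.5},
    {"country": "A", "year_week": "2020-W11", "variable": "testing_rate", "value": 150.0},
    {"country": "A", "year_week": "2020-W11", "variable": "positivity_rate", "value": 6.0},
]

# Keep only the week number, since every row belongs to the same year.
for row in long_rows:
    row["week"] = week_number(row["year_week"])

wide_rows = long_to_wide(long_rows, ("country", "week"), "variable", "value")
```

Each wide row then holds both rates for one country and week, which is the shape a scatterplot of one rate against the other needs.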

We need to standardize the values of the testing rate and the positivity rate.

To rescale the data to the range from 0 to 1, we create a standardization function:

Inputs:

Outputs:
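The function itself is not reproduced in the text. One common way to bring values into the range 0 to 1 is min-max scaling, sketched below under that assumption. The name `min_max_scale` is hypothetical, and the original function may differ.

```python
def min_max_scale(values):
    """Rescale numbers linearly so the minimum maps to 0 and the maximum to 1.

    Input:  an iterable of numbers (for example, one country's weekly testing rates).
    Output: a list of floats in [0, 1], in the same order as the input.
    """
    values = list(values)
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant series has no spread; map every value to 0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical values, for illustration only.
scaled = min_max_scale([200.0, 500.0, 800.0])
```

Scaling both rates to the same range lets them share one axis without the larger-valued series dominating the plot.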

As the dataset is rather large, conclusions cannot be drawn from the table alone. The scatterplot below shows the relationship between testing rate and positivity rate by date and country. The plots are split into facets with independent scales to make interpretation easier, as levels differ considerably between countries.

To calculate the correlation for each country, we define a function.
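A per-country correlation can be sketched as below. This assumes the Pearson coefficient is the measure intended (it is the one named later in the analysis). The function names, field names and sample rows are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 points each")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant series
    return cov / sqrt(var_x * var_y)


def correlation_by_country(rows, x_key, y_key):
    """Group rows by country and compute one Pearson coefficient per group."""
    groups = defaultdict(lambda: ([], []))
    for row in rows:
        xs, ys = groups[row["country"]]
        xs.append(row[x_key])
        ys.append(row[y_key])
    return {country: pearson(xs, ys) for country, (xs, ys) in groups.items()}


# Hypothetical rows, for illustration only.
rows = [
    {"country": "A", "testing_rate": 1.0, "positivity_rate": 2.0},
    {"country": "A", "testing_rate": 2.0, "positivity_rate": 4.0},
    {"country": "A", "testing_rate": 3.0, "positivity_rate": 6.0},
    {"country": "B", "testing_rate": 1.0, "positivity_rate": 3.0},
    {"country": "B", "testing_rate": 2.0, "positivity_rate": 2.0},
    {"country": "B", "testing_rate": 3.0, "positivity_rate": 1.0},
]
by_country = correlation_by_country(rows, "testing_rate", "positivity_rate")
```

Computing the coefficient within each country rather than on the pooled data matters here, because the sign of the relationship can differ from one country to another.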

What is the association between new cases and deaths?

This question involves data over a long period, so it is again reasonable to answer it with visualizations. The columns needed for this answer were extracted into a separate dataset and reshaped to wide format, as for the previous answer. The head of the resulting dataset shows that new cases and deaths are both stored in variable/value columns. The year-week format was also changed to make the axis more readable.

Using a faceted bar plot we can easily compare weekly cases and deaths by country and week of the year. The plot shows a positive association, which we can estimate numerically using the Pearson correlation.

The overall correlation coefficient indicates a strong positive association between weekly cases and weekly deaths.

Which country has the highest number of deaths?

Answering this requires some initial calculations: we need the total number of COVID-associated deaths over the observed period. Based on the calculations below, Italy has the highest total, more than 75 thousand deaths in 2020. The lowest number of deaths was observed in Malta.
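The aggregation behind this answer is a sum of weekly deaths per country. A minimal sketch follows, assuming one row per country and week with a `weekly_deaths` field. The field names and figures are hypothetical and do not come from the dataset.

```python
from collections import Counter


def total_deaths_by_country(rows):
    """Sum weekly deaths over the whole observed period, per country."""
    totals = Counter()
    for row in rows:
        totals[row["country"]] += row["weekly_deaths"]
    return totals


# Hypothetical weekly rows, for illustration only.
rows = [
    {"country": "A", "week": 10, "weekly_deaths": 120},
    {"country": "A", "week": 11, "weekly_deaths": 300},
    {"country": "B", "week": 10, "weekly_deaths": 2},
    {"country": "B", "week": 11, "weekly_deaths": 5},
]

totals = total_deaths_by_country(rows)
highest = max(totals, key=totals.get)  # country with the largest total
lowest = min(totals, key=totals.get)   # country with the smallest total
```

Note that these are absolute totals, so they favor small countries; comparing by population requires a rate.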

To compare countries by geographical position we can use an interactive map.

What is the variability of deaths across countries?

The previous question let us estimate the total number of deaths in 2020 due to the COVID pandemic. It is also important to know how much weekly deaths vary across the countries studied. For that we used a boxplot of weekly deaths per country.

As each country has a different population size, we estimated the fatality rate, the number of deaths relative to population. Italy remains the leader in both median and maximum fatality rate. High median fatality levels were also observed for France, Belgium, Greece and Latvia. The lowest median fatality belongs to Estonia. Denmark, Malta and Portugal have the lowest variability of fatality.
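The quantities a boxplot displays (median, maximum and spread) can also be computed directly. The sketch below assumes the fatality rate is expressed as weekly deaths per 100K of population and uses the interquartile range as the measure of variability. Both choices, the name `fatality_summary` and the sample figures are assumptions for illustration.

```python
from statistics import median, quantiles


def fatality_summary(weekly_deaths, population, per=100_000):
    """Summarize weekly deaths per `per` inhabitants for one country."""
    rates = [deaths * per / population for deaths in weekly_deaths]
    q1, _, q3 = quantiles(rates, n=4)  # quartile cut points
    return {
        "median": median(rates),
        "max": max(rates),
        "iqr": q3 - q1,  # interquartile range: the height of the box in a boxplot
    }


# Hypothetical country with a population of one million, for illustration only.
summary = fatality_summary([10, 20, 30, 40, 50], population=1_000_000)
```

Comparing `median` across countries corresponds to comparing the center lines of the boxes, while `iqr` corresponds to comparing their heights.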

CONCLUSIONS

This project describes an end-to-end analysis of COVID-19 data, including data loading, filtering, reshaping, transformation and visualization. The analysis focuses on weekly 2020 statistics for 15 countries: Greece, Malta, Italy, Portugal, Latvia, Denmark, Czechia, Sweden, Belgium, France, Netherlands, Slovenia, Estonia, Germany and Ireland.
The study found that Denmark had the highest testing rate during 2020. The association between testing rate and positivity rate varies by country: it is strongly positive for some (such as Croatia), strongly negative for others (such as Cyprus) and weak for others (such as Italy). At the same time, weekly deaths are strongly positively associated with weekly new cases. Italy recorded the highest number of deaths in 2020, more than 75 thousand. It also has the highest fatality rate among the 15 countries used for the analysis.

FUTURE WORK

Further steps in this project are oriented toward analysis of the 2021 data, including vaccination factors. As different virus strains have been identified during the last month, it is important to include this information in further investigations of COVID-related data.

REFERENCES

  1. How ECDC collects and processes COVID-19 data. https://www.ecdc.europa.eu/en/covid-19/data-collection

  2. Sources - Worldwide data on COVID-19. https://www.ecdc.europa.eu/en/publications-data/sources-worldwide-data-covid-19

  3. Data on hospital and ICU admission rates and current occupancy for COVID-19. https://www.ecdc.europa.eu/en/publications-data/download-data-hospital-and-icu-admission-rates-and-current-occupancy-covid-19

  4. Data on testing for COVID-19 by week and country. https://www.ecdc.europa.eu/en/publications-data/covid-19-testing

  5. Data on the weekly subnational 14-day notification rate of new COVID-19 cases. https://www.ecdc.europa.eu/en/publications-data/weekly-subnational-14-day-notification-rate-covid-19